Network management and control using collaborative on-line simulation

ABSTRACT

A collaborative on-line simulation system and method provide automated and pro-active control functions for computer networks. In a wide area network, clients communicate through one or several nodes (108). Each node (108) contains routers which include a control plane (202) and a data plane (204). Collaborative on-line simulators (206) are interfaced to the network nodes (108), continuously monitor the surrounding network conditions, communicate with other simulators and execute collaborative on-line simulation. Based on the simulation results, the on-line simulators (206) continuously tune selected network parameters to a more efficient operation point to fit the current network conditions.

BACKGROUND OF THE INVENTION

I. Field of the Invention

The present invention generally relates to computer network data management and control. In particular, the present invention relates to providing a system and method to improve computer network control by providing real-time tuning of the network for better performance.

II. Description of the Related Art

As the Internet and other available global network data transfer mechanisms become increasingly in demand, network traffic over these data networks has become problematic. The number of data packet losses, requiring packet re-transmission, as well as the failure of network components have caused networks to experience reduced data transfer rates and, in many cases, network failure due to inefficient network management. Network management involves the collection of data from the network using protocols like SNMP. There are few tools that innovatively interpret this data to predict network faults.

Conventional network simulators are used for network design, and in some cases network planning, in order to design more efficient networks to handle today's increasing demands. These conventional simulators are not used for on-line network control, but rather run in an experimental setting using a representative sample of the network data or a model of the network structure to develop better protocols and mechanisms to transfer data.

In addition, these conventional simulators are becoming less efficient because today's network data loads and operating conditions vary greatly over time. In order to maintain a more efficient network, there is a need for a mechanism to configure computer networks by using live data, where changes in the configuration can be implemented in real-time.

SUMMARY OF THE INVENTION

The present invention provides a collaborative on-line simulation system and method to provide automated and pro-active control functions for computer networks. The system and method introduce autonomous on-line simulators into local networks. These autonomous on-line simulators continuously monitor the surrounding network conditions, collect relevant network parameter information, communicate with other simulators and execute collaborative on-line simulation. Based on the simulation results, the on-line simulators then continuously tune selected network parameters to an efficient operation point to fit the current network conditions.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other advantages and features of the invention will become more apparent from the detailed description of exemplary embodiments provided below with reference to the accompanying drawings in which:

FIG. 1 illustrates a model of network nodes of a wide area computer network;

FIG. 2 illustrates a network node of a local area network of FIG. 1 interfaced to the collaborative on-line simulation system of the present invention;

FIG. 3 illustrates the structure of the collaborative on-line simulation system of FIG. 2;

FIG. 4(a) illustrates a flowchart of the hybrid parameter searching of the present invention;

FIG. 4(b) illustrates the farmer-worker structure of the collaborative on-line simulation system of the present invention; and

FIG. 5 illustrates a processor-based system which incorporates the collaborative on-line simulation system and method of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings, where like reference numerals designate like elements, there is shown in FIG. 1 a wide area network 100 for use in various applications, such as communications and Internet web page hosting, including local networks 102, 104 and 106. Each of the local networks 102, 104, 106 includes at least one server containing processors, databases, mainframes and other equipment used to distribute data to multiple clients, the users, connected by network nodes 108 through inter-connections 110 and 112. The clients exchange data with clients within the same local network 102, 104 and 106 as well as with those in other local networks 102, 104 and 106 through one or several nodes 108. The nodes 108 are inter-connected to conventional wide area network backbone hardware/software. For example, the wide area network can be the Internet.

As shown in FIG. 2, each network node 108 contains routers 201 which include a control plane 202 and a data plane 204. The control plane 202 and data plane 204 are essentially two separate communication paths used to pass, respectively, control data describing where the data is traveling (protocol information) and the data itself. Data is transmitted on the data plane 204 and control signals for network parameters (e.g. protocol parameters) are transmitted on the control plane 202. The control plane 202 and data plane 204 are connected amongst the several network nodes 108 through inter-connections 110 and 112 to form local networks 102, 104, 106 and ultimately a wide area network 100.

Collaborative on-line simulators 206 are interfaced to the network nodes 108 within the local networks 102, 104 and 106. The on-line simulators 206 continuously monitor the surrounding network conditions, collect the relevant information, e.g. on-line protocol parameters through on-line traffic, and exchange information with other simulators by sending information, including advised parameter settings, along line 212 through control plane 202. Based on the information received, simulations are executed by the on-line simulators 206 and parameter search methods are used to evaluate the results of the simulations and search for better network parameters.

The use of simulators, as well as the use of various types of simulations by the simulators, is well known in the art. However, conventionally the results of these simulations are not used to change network parameters in real-time because the results would be unreliable due to the changing conditions of the wide area network 100 and the large number of network nodes 108 experiencing different conditions. The present invention uses conventional simulations but enables each on-line simulator 206 to use the output (results) of another on-line simulator 206 as input when performing its simulations. In this way the results of each individual on-line simulator 206 are more reliable because each result is based upon current and future network conditions. Thus, the present invention allows the network parameters to change continuously in real-time and have a net overall improvement on the wide area network 100, so that dynamic and automatic network control can be achieved. Note that the above on-line simulators 206 interact with the control plane 202. Therefore, the on-line simulators 206 actually accomplish a second-order control over the wide area network 100. In other words, the on-line simulation merely prescribes parameters required for the operation of network protocols and does not interfere with their normal operation in any other way.

FIG. 3 is a block diagram illustrating the architecture of the collaborative on-line simulators 206. Each on-line simulator 206 includes a monitor and modeling unit 302, experiment design unit 306, management interface unit 304 and experiment execution unit 308. The above units may be implemented in software and executed at network nodes 108.

The monitor and modeling unit 302 continually collects information about the local network (e.g., network topology, traffic conditions, etc.) and tries to build the most up-to-date network model to represent current network conditions for use during simulation. The management interface unit 304 is the control center of the on-line simulator 206. The management interface unit 304 controls and synchronizes the operation of all the other units within the on-line simulator 206 while serving as an interface of the on-line simulator 206 with the network nodes 108 through which it is connected along lines 110 and 112 (FIG. 1). The experiment design unit 306 is responsible for setting up simulation experiments with appropriate search techniques (explained below), and analyzing the results of the simulation experiments to perform further searches to find more efficient network parameter settings, if necessary. The experiment execution unit 308 executes the simulations received from the experiment design unit 306 and returns the results to the experiment design unit 306.
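The division of labor among the four units can be illustrated with a minimal sketch. Everything below is hypothetical and chosen only for illustration: the class and method names, the single `threshold` parameter, the normalized `offered_load` field, and the toy cost function that stands in for a real network simulation.

```python
from dataclasses import dataclass
from typing import Dict, List

Params = Dict[str, float]


@dataclass
class NetworkModel:
    """Snapshot built by the monitor and modeling unit (illustrative field only)."""
    offered_load: float  # hypothetical normalized traffic load


class MonitorAndModelingUnit:
    def build_model(self, measurements: List[float]) -> NetworkModel:
        # Use the mean of the recent samples as the current load estimate.
        return NetworkModel(offered_load=sum(measurements) / len(measurements))


class ExperimentExecutionUnit:
    def run(self, model: NetworkModel, params: Params) -> float:
        # Toy stand-in for a simulation run: lower cost is better, and the
        # cost is smallest when the threshold matches the offered load.
        return abs(params["threshold"] - model.offered_load)


class ExperimentDesignUnit:
    def __init__(self, executor: ExperimentExecutionUnit) -> None:
        self.executor = executor

    def search(self, model: NetworkModel, candidates: List[Params]) -> Params:
        # Hand each candidate setting to the execution unit and keep the best.
        return min(candidates, key=lambda p: self.executor.run(model, p))


class ManagementInterfaceUnit:
    """Control center: sequences the other units for one tuning cycle."""

    def __init__(self) -> None:
        self.monitor = MonitorAndModelingUnit()
        self.design = ExperimentDesignUnit(ExperimentExecutionUnit())

    def tuning_cycle(self, measurements: List[float],
                     candidates: List[Params]) -> Params:
        model = self.monitor.build_model(measurements)
        return self.design.search(model, candidates)
```

In this sketch a call such as `ManagementInterfaceUnit().tuning_cycle([0.6, 0.8], candidates)` returns the candidate whose threshold lies closest to the measured mean load; in a deployed system the returned setting would be passed on to the router through the control plane.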

Besides interacting with the local network 102, 104, 106, each on-line simulator 206 also communicates with other simulators and exchanges the relevant network parameter information, such as network traffic models and efficient network parameters. Thus, a collaborative, scalable on-line simulation network is formed. The network is scalable in that with the addition of each additional network node 108 additional collaborative on-line simulators 206 may be added which will work in conjunction with previously existing simulators to change network parameters in real-time. Through this, each of the local on-line simulators 206 acquires a global view of the network and thus is able to perform better network simulation and control.

Since the network conditions keep changing all the time, the on-line simulation system and method also require a fast experiment design method to quickly finish the simulation experiments and find efficient network parameter settings before the underlying network information becomes stale. The goal is to use as few experiments as possible to find as efficient a parameter setting as possible. Note that the emphasis is not on seeking the optimum setting. Instead, a best-effort strategy is adopted to find a better operating point within a limited time frame. Thus, the search and simulations can be interrupted at any time and still produce a result better than the starting point. This provides the possibility to make a compromise between the quality of the result and the search time to obtain the result. In a preferred embodiment, the Random Early Drop (RED) queuing management algorithm is used as the underlying network algorithm to be adjusted because of its sensitivity to parameter settings.
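The best-effort, interruptible behavior described above can be sketched as an "anytime" search loop. This is a minimal illustration under stated assumptions: the function name `anytime_search`, the wall-clock deadline used to model the interruption, and the caller-supplied `cost` function (standing in for one simulation run) are all hypothetical.

```python
import time
from typing import Callable, Iterable, Tuple, TypeVar

P = TypeVar("P")


def anytime_search(start: P, candidates: Iterable[P],
                   cost: Callable[[P], float],
                   deadline_s: float) -> Tuple[P, float]:
    """Evaluate candidates until time runs out; return the best setting so far."""
    best, best_cost = start, cost(start)
    stop_at = time.monotonic() + deadline_s
    for cand in candidates:
        if time.monotonic() >= stop_at:
            break  # interrupted: keep whatever has been found so far
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

Because the starting point is evaluated first and is only replaced by a strictly cheaper candidate, the returned setting is never worse than the starting point, however early the loop is cut off. A longer deadline simply allows more candidates to be examined.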

To accomplish a speedy result, the present invention implements a two-part hybrid search method in the concerned parameter space as shown in the flowchart of FIG. 4(a).

First, a high level pruning step occurs. The search space is probed to determine the important parameters which will have the most effect on network performance (step 470). After pruning part of the search space by ignoring less important parameters, those remaining parameters are searched in more detail (step 472).

In a preferred embodiment, the high level pruning occurs as follows. The search space is probed by the on-line simulators 206 conducting simulations in portions of the parameter space, specifically the boundaries of the space. These simulations will be based upon a 2^(k) full factorial experiment design, where k is the number of parameters. 2^(k) full factorial design is known in the art of performance analysis.

2^(k) full factorial design examines all possible combinations of the parameter boundaries and fits the parameter boundary results into a non-linear regression model. The model analyzes the importance of different parameters. The above method is not an iterative method. Instead, to achieve an increasingly refined result, simulations are ordered to form a series of subsets and the first subset is generated by applying 2^(k−p) fractional factorial design on the parameter space. “p” is the largest integer satisfying 2^(k−p)≥k, which is required by the regression analysis, so that the first subset is the smallest one that still supports the analysis. 2^(k−p) fractional factorial design is a technique which executes only part of the experiments in a 2^(k) full factorial design.
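As an illustration of how a 2^(k) full factorial design ranks parameters, the following sketch enumerates every combination of parameter boundaries in coded form (-1 for the lower bound, +1 for the upper bound) and estimates one main-effect coefficient per parameter. The function names are hypothetical, the responses are assumed to come from simulation runs, and interaction terms of the regression model are left out for brevity.

```python
from itertools import product
from typing import List, Sequence, Tuple


def full_factorial(k: int) -> List[Tuple[int, ...]]:
    """All 2^k combinations of coded levels -1 (lower bound) and +1 (upper bound)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design: Sequence[Sequence[int]],
                 responses: Sequence[float]) -> List[float]:
    """Main-effect coefficient of parameter i: (1/N) * sum over runs of x_i * y."""
    n = len(design)
    k = len(design[0])
    return [sum(run[i] * y for run, y in zip(design, responses)) / n
            for i in range(k)]


# Toy response standing in for simulation output: parameter 0 matters far
# more than parameter 1, and the estimated effects reflect that.
design = full_factorial(2)
responses = [10 + 3 * a + 0.5 * b for a, b in design]
effects = main_effects(design, responses)
```

Parameters with a small absolute effect are candidates for pruning; the remaining ones are carried forward to the detailed search.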

By carefully selecting the simulations, the analysis of the parameter importance can be executed with improved speed at only a minimal expense to accuracy. After finishing a subset of simulations and analyzing the simulation results, the next larger subset, which is obtained by using 2^(k−p+1) fractional factorial design, is analyzed, and so on until all 2^(k) simulations are finished. During this process, if the search is interrupted, the analysis result based on the last subset of simulations is returned as the “best-so-far” result. Thus, the network has still been tuned for better efficiency.
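The growing series of subsets can be sketched as a schedule of run counts together with a loop that keeps the last completed analysis. This sketch rests on an interpretation, not on anything the text states outright: it assumes the first subset is the smallest power of two that is at least k, and that each later subset doubles the run count up to 2^(k). The names `subset_sizes` and `best_so_far`, the `analyze` callback and the `interrupted` callback are hypothetical, and the choice of which runs enter each fractional design is not modeled.

```python
from typing import Callable, List, Optional, TypeVar

R = TypeVar("R")


def subset_sizes(k: int) -> List[int]:
    """Run counts 2^(k-p), 2^(k-p+1), ..., 2^k for k parameters."""
    size = 1
    while size < k:          # smallest power of two that is at least k
        size *= 2
    sizes = []
    while size <= 2 ** k:    # double until the full factorial is reached
        sizes.append(size)
        size *= 2
    return sizes


def best_so_far(k: int, analyze: Callable[[int], R],
                interrupted: Callable[[], bool]) -> Optional[R]:
    """Analyze growing subsets; return the last analysis completed before interruption."""
    result: Optional[R] = None
    for n_runs in subset_sizes(k):
        if interrupted():
            break
        result = analyze(n_runs)
    return result
```

For five parameters this schedule is 8, 16 and then 32 runs, so an interruption after the second stage still leaves an analysis based on 16 simulations.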

Second, once the high level pruning is complete, the next task is to search the remaining parameter space in detail with state space search techniques. Basically, the state space search method includes two important components, exploration and exploitation, and a balance strategy between them (steps 474 and 476). Exploration encourages the search process to examine unknown regions. Exploitation attempts to converge to a maximum or minimum in the vicinity of a chosen region.
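One simple way to balance exploration and exploitation is sketched below for a single remaining parameter. The text does not name a specific state space search algorithm, so this random-sampling and local-perturbation scheme is only an assumed example; the function name, the `explore_prob` balance knob, the `step` size and the fixed seed are all hypothetical.

```python
import random
from typing import Callable, Tuple


def explore_exploit(cost: Callable[[float], float], low: float, high: float,
                    rounds: int = 20, explore_prob: float = 0.3,
                    step: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Minimize cost over [low, high] by mixing random probes with local moves."""
    rng = random.Random(seed)
    best = rng.uniform(low, high)
    best_cost = cost(best)
    for _ in range(rounds):
        if rng.random() < explore_prob:
            # Exploration: probe a region that has not been examined yet.
            cand = rng.uniform(low, high)
        else:
            # Exploitation: small move near the best point, kept within bounds.
            move = rng.uniform(-step, step) * (high - low)
            cand = min(high, max(low, best + move))
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost
```

Raising `explore_prob` shifts the balance toward examining unknown regions, while lowering it concentrates the effort on converging near the current best point.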

Thus, the hybrid on-line simulation method, and the system that implements it, use a best-effort strategy in their second-order control, whose emphasis is not on full optimization, but on continuously and incrementally moving the system towards a better operating point. The present invention continuously tunes the underlying network protocols (albeit at a larger time-scale than their normal operation) and therefore equips the network management infrastructure with “pro-active” management capabilities.

In another preferred embodiment, simulation execution is sped up using parallel execution of the simulations. FIG. 4(b) illustrates a parallel processing architecture using a farmer-worker infrastructure. The farmer-worker infrastructure of FIG. 4(b) allows for distribution of many single-machine simulations across multiple workstations. The dispatcher 402 is the interface between the distributed simulation executers (the “workers”) 406, 408, 410 and the experiment design unit 306. All the simulations have to go through this dispatcher 402, which hands them over for distribution among the workers 406, 408, 410. The farmer 404 is the center of this infrastructure and coordinates the operations of the dispatcher 402 and workers 406, 408, 410. The farmer 404 may use conventional distributed network architecture queuing schemes to distribute and route simulations amongst the workers 406, 408, 410, where the workers 406, 408, 410 are the actual simulation executers. The above farmer-worker infrastructure can use multiple workers 406, 408, 410 for the same experiment design unit 306 to speed up the simulation process. In a preferred embodiment, all the communication in this scheme is through TCP connections. Therefore, the dispatcher 402, farmer 404, and workers 406, 408, 410 can be located anywhere in the network. Thus, experiments can be evenly distributed over the whole wide area network to maximize the utilization of the computing resources.
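The farmer-worker pattern can be illustrated in miniature with a shared task queue. This sketch makes several simplifying assumptions: the workers are threads in one process rather than workstations reached over TCP, the queue plays the role of the farmer, the loop that enqueues experiments plays the role of the dispatcher, and the `simulate` function supplied by the caller stands in for a single-machine simulation. The name `run_farm` is hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List, Sequence, TypeVar

P = TypeVar("P")
R = TypeVar("R")


def run_farm(simulate: Callable[[P], R], experiments: Sequence[P],
             n_workers: int = 3) -> List[R]:
    """Distribute experiments over worker threads; return results in input order."""
    tasks: queue.Queue = queue.Queue()
    results: Dict[int, R] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work for this worker
                break
            idx, params = item
            out = simulate(params)    # the actual simulation run
            with lock:
                results[idx] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for idx, params in enumerate(experiments):  # dispatcher: hand over the work
        tasks.put((idx, params))
    for _ in threads:                           # one sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(experiments))]
```

Because every worker pulls from the same queue, a slow simulation on one worker does not hold up the others, which is the load-spreading property the farmer-worker infrastructure aims for.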

Referring now to FIG. 5, each network node 108 of the local networks 102, 104 and 106 may contain a processor-based system 500 for implementing the above described system and method. The processor-based system 500 includes a central processing unit (CPU) 502, for example, a microprocessor, that communicates with one or more input/output (I/O) devices 508, 510 over a bus 516. The processor-based system 500 also includes random access memory (RAM) 512, a read only memory (ROM) 514 and may include peripheral devices such as a disk drive 504 and CD-ROM drive 506 which also communicate with CPU 502 over the bus 516. Memory 512 can be configured to store the collaborative on-line simulation system and method of the present invention as described above. It may also be desirable to integrate the processor 502 and memory 512 on a single integrated chip.

Hence, the present invention provides a system and method for improving the efficiency of a computer network by the use of on-line simulators which execute at least one simulation based upon current network conditions and tune network parameters in real time.

Although the invention has been described above in connection with exemplary embodiments, it is apparent that many modifications and substitutions can be made without departing from the spirit or scope of the invention. In particular, although the invention is described with reference to tuning network protocol parameters, the system and method can also be applied to other aspects of a computer network such as routing. Likewise, the system can be implemented on a UNIX, LINUX or any other operating system. Accordingly, the invention is not to be considered as limited by the foregoing description, but is only limited by the scope of the appended claims.

1. A system for improving data network performance, the system comprising: a first router coupled to a first local area network; a first simulator coupled to the first router, the first simulator configured to receive current network parameters from the first router, perform a first network simulation based on the current network parameters, and transmit to the first router improved network parameters based on a result of the first simulation, the improved network parameters improving data network efficiency of the first local area network; a second router coupled to a second local area network, the second local area network being coupled to the first local area network; and a second simulator coupled to the second router, the second simulator configured to receive current network parameters from the second router and the improved network parameters from the first simulator, perform a second network simulation based on the current network parameters received from the second router and the improved network parameters received from the first simulator, and transmit to the second router and the first simulator additional improved network parameters based on a result of the second simulation, the additional improved network parameter improving data network efficiency of the second local area network, wherein the first router is configured to adjust at least one current network operating parameter based on the improved network parameters and the second router is configured to adjust at least one current network operating parameter based on the additional improved network parameters.
2. The system of claim 1, wherein the first and second simulators are configured to use a best-effort strategy to respectively determine the improved network parameters and additional improved network parameters within a limited time frame.
3. The system of claim 1, wherein the first and second routers are configured to implement a Random Early Drop queuing management algorithm and improve the efficiency of the algorithm based respectively on the improved network parameters and the additional improved network parameters.
4. The system of claim 1, wherein the first and second simulators are configured to perform a high level pruning step to determine which of the current network parameters received respectively from the first and second routers will have the most effect on network performance.
5. The system of claim 1, wherein the first and second simulators are configured to implement 2^(k) full factorial experiment design.
6. The system of claim 1, wherein the first simulator comprises software stored in a memory of and configured to be executed on a node of the first local area network and the second simulator comprises software stored in a memory of and configured to be executed on a node of the second local area network.
7. The system of claim 1, wherein the first and second routers are configured to continuously update current network parameters of their respective local area networks in real-time based on simulation results from the first and second simulators, respectively.
8. The system of claim 1, wherein the first and second simulators are computers coupled respectively to the first and second local area networks.
9. A method for improving performance of a data network, the method comprising: collecting current network parameters of a first local network by a first router coupled to the first local network; performing a first simulation by a first simulator coupled to the first local network based on the current network parameters of the first local network; replacing at least one current network parameter of the first local network with an improved network parameter based on a result of the first simulation, the improved network parameters improving data network efficiency of the first local network; collecting current network parameters of a second local network by a second router coupled to the second local network; performing a second simulation by a second simulator coupled to the second local network based on the current network parameters of the second local network and a result of the first simulation; replacing at least one current network parameter of the second local network with an improved network parameter based on a result of the second simulation, the improved network parameter improving data network efficiency of the second local network.
10. The method of claim 9, further comprising transmitting improved network parameters from the first simulator to the first router based on a result of the first simulation.
11. The method of claim 10, further comprising transmitting improved network parameters from the second simulator to the second router and the first simulator based on a result of the second simulation.